feat: add Sandbox abstraction for agent code execution environments by agent-of-mkmeral · Pull Request #1968 · strands-agents/sdk-python

agent-of-mkmeral · 2026-03-25T16:08:07Z

Summary

Add the Sandbox interface that decouples tool logic from where code runs. Tools that need to execute code or access a filesystem receive a Sandbox instead of managing their own execution, enabling portability across local, Docker, and cloud environments.

Related: Design Doc PR #681

What Changed

New Files (4 source + 5 test)

File	Lines	Purpose
`src/strands/sandbox/__init__.py`	22	Module exports
`src/strands/sandbox/base.py`	263	`Sandbox` ABC with streaming `AsyncGenerator` interface, `ExecutionResult`, convenience methods (`read_file`, `write_file`, `list_files`, `execute_code`), auto-start lifecycle
`src/strands/sandbox/local.py`	161	`LocalSandbox` — native file I/O overrides, asyncio subprocess execution
`src/strands/sandbox/docker.py`	354	`DockerSandbox` — container lifecycle, stdin piping for safe file writes, `docker exec` streaming
`tests/strands/sandbox/test_base.py`	239	22 tests for ABC, ExecutionResult, convenience methods, lifecycle, edge cases
`tests/strands/sandbox/test_local.py`	201	17 tests for real command execution, file I/O, streaming, timeouts, code execution
`tests/strands/sandbox/test_docker.py`	314	18 tests for Docker lifecycle, container management, file ops (all mocked)
`tests/strands/sandbox/test_agent_sandbox.py`	49	6 tests for Agent integration

Modified Files (2)

File	Change
`src/strands/__init__.py`	Added exports for `Sandbox`, `LocalSandbox`, `DockerSandbox`, `ExecutionResult`
`src/strands/agent/agent.py`	Added `sandbox` parameter to `Agent.__init__`, defaults to `LocalSandbox()`

Design Decisions

✅ IN core SDK

Sandbox ABC — the interface contract
LocalSandbox — zero-dependency host execution (default)
DockerSandbox — containerized execution
ExecutionResult dataclass
Agent integration (sandbox parameter)

❌ NOT in core SDK (for now)

AgentCoreSandbox — External dependency on bedrock-agentcore package. Should live in a separate package
Sandbox tools (run_command, python_tool, programmatic_tool_caller) — These are tools, not core abstractions. They belong in strands-agents/tools
CodeActPlugin — P2 in the design doc, not yet ready

Rationale: The core SDK should provide the minimal, zero-external-dependency abstraction. Tools and cloud-specific sandbox implementations have their own dependency graphs and release cycles.

Key Design Details

Streaming Interface

async for chunk in sandbox.execute("echo hello"):
    if isinstance(chunk, str):
        print(chunk, end="")  # stream output as it arrives
    elif isinstance(chunk, ExecutionResult):
        print(f"Exit: {chunk.exit_code}")  # final result

Agent Integration

from strands import Agent
from strands.sandbox import DockerSandbox

agent = Agent(sandbox=DockerSandbox(image="python:3.12-slim"))
# Tools access via: tool_context.agent.sandbox

Security

Heredoc injection → randomized secrets.token_hex(8) delimiter
Path injection → shlex.quote() for all paths
Shell escaping → shlex.quote() for code arguments
DockerSandbox → stdin piping instead of heredoc for file writes

Tests

76 new tests, all passing ✅
1686 existing tests still passing, 0 regressions ✅

What's Left (Outside This PR)

AgentCoreSandbox — needs its own package with bedrock-agentcore dependency
Sandbox tools — should go to strands-agents/tools
CodeActPlugin — P2 per design doc
Docker integration tests — need Docker in CI

cc @mkmeral

Add the Sandbox interface that decouples tool logic from where code runs. Tools that need to execute code or access a filesystem receive a Sandbox instead of managing their own execution, enabling portability across local, Docker, and cloud environments. Core components: - Sandbox ABC with streaming AsyncGenerator interface (base.py) - LocalSandbox for host-process execution via asyncio subprocesses (local.py) - DockerSandbox for containerized execution via docker exec (docker.py) - Agent integration: sandbox parameter on Agent.__init__, defaults to LocalSandbox Key design decisions: - Only core abstractions in SDK; AgentCoreSandbox and sandbox tools belong in separate packages (external dependencies, different release cycles) - Streaming output via AsyncGenerator[str | ExecutionResult] - yields lines as they arrive, ExecutionResult as the final yield - Security: randomized heredoc delimiters, shlex.quote for all paths, stdin piping in DockerSandbox to prevent injection - Auto-start lifecycle: sandbox starts on first execute() call - Zero external dependencies for core sandbox package Tests: 76 new tests (base: 22, local: 17, docker: 18, agent: 6, + shared) All 76 sandbox tests passing, 1686 existing tests still passing.

agent-of-mkmeral

Adversarial Testing Result: FAIL — 3 bugs found (1 security, 1 crash, 1 code correctness)

Scope: Full adversarial testing of Sandbox abstraction (base.py, local.py, docker.py, agent integration). Tested shell injection, path traversal, heredoc injection, concurrency, lifecycle edge cases, timeout cleanup, type safety, and contract verification.
Tests written: 67 (61 adversarial + 6 confirmation)
Tests passing: 63
Tests failing (findings): 4 (3 genuine bugs, 1 test infrastructure)

Findings Summary

#	Category	Severity	Description
1	Security Issue	Critical	`execute_code()` language parameter injection — unsanitized
2	Bug	High	Long output lines (>64KB) crash `readline()` with `ValueError`
3	Bug	Medium	Non-existent working directory raises unhandled `FileNotFoundError`
4	Code Smell	Low	`proc.returncode or 0` masks `None` as `0` (false success)

Finding 1: `execute_code()` language parameter injection

Category: Security Issue
Severity: Critical

Reproduction:

sandbox = LocalSandbox(working_dir="/tmp")
marker = "/tmp/injection_marker.txt"
malicious_language = f"touch {marker} #"
await sandbox._execute_code_to_result("print('safe')", language=malicious_language)
assert os.path.exists(marker)  # True — injection succeeded!

Observed behavior: The language parameter is interpolated directly into the shell command without sanitization: f"{language} -c {shlex.quote(code)}". Code is properly shlex.quote()'d, but language is NOT. Any caller passing untrusted input as language enables arbitrary command execution.

Expected behavior: language should be validated against an allowlist (e.g., python, ruby, node) or at minimum shlex.quote()'d.

Fix: Either shlex.quote(language) or validate against an allowlist:

# Option A: Quote it
async for chunk in self.execute(f"{shlex.quote(language)} -c {shlex.quote(code)}", timeout=timeout):
    yield chunk

# Option B: Allowlist (stricter)
ALLOWED_LANGUAGES = {"python", "python3", "ruby", "node", "bash", "sh"}
if language not in ALLOWED_LANGUAGES:
    raise ValueError(f"Unsupported language: {language}")

Finding 2: Long output lines crash `readline()`

Category: Bug
Severity: High

Reproduction:

sandbox = LocalSandbox(working_dir="/tmp")
await sandbox._execute_to_result("echo " + "A" * 100000)
# Raises: ValueError: Separator is found, but chunk is longer than limit

Observed behavior: asyncio.StreamReader.readline() has a default buffer limit of 64KB. When a command outputs a single line exceeding this, readline() raises ValueError('Separator is found, but chunk is longer than limit'). This propagates as an unhandled exception.

Expected behavior: The sandbox should handle long output lines gracefully. Either increase the buffer limit when creating the subprocess, or catch ValueError/LimitOverrunError in _read_stream.

Fix: Set a larger limit when creating the subprocess:

proc = await asyncio.create_subprocess_shell(
    command,
    cwd=self.working_dir,
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE,
    limit=1024 * 1024,  # 1MB line limit
)

Note: asyncio.create_subprocess_shell accepts a limit parameter that's passed to the StreamReader.

Finding 3: Non-existent working directory crashes

Category: Bug
Severity: Medium

Reproduction:

sandbox = LocalSandbox(working_dir="/nonexistent/path")
await sandbox._execute_to_result("echo hello")
# Raises: FileNotFoundError: [Errno 2] No such file or directory: '/nonexistent/path'

Observed behavior: asyncio.create_subprocess_shell(cwd="/nonexistent") raises FileNotFoundError which propagates unhandled.

Expected behavior: Either validate the working directory in __init__/start(), or catch the exception in execute() and return an ExecutionResult with a non-zero exit code and descriptive stderr.

Finding 4: `proc.returncode or 0` masks None as 0

Category: Code Smell
Severity: Low

Location: local.py:115, docker.py:131, docker.py:262

Issue: The pattern proc.returncode or 0 uses Python's falsy semantics. If returncode is None (process not finished), it maps to 0 (success). The correct pattern is 0 if proc.returncode is None else proc.returncode.

Risk: Low in practice (after proc.wait(), returncode should never be None), but the code is semantically incorrect and fragile.

Other observations (non-findings)

✅ Heredoc injection: Base class write_file uses randomized STRANDS_EOF_{hex} delimiter — resistant to injection. LocalSandbox overrides with native file I/O (even better).

✅ DockerSandbox stdin piping: write_file uses stdin pipe instead of heredoc — immune to shell metacharacter injection.

✅ shlex.quote() on file paths: All path arguments are properly quoted.

✅ Concurrent execution: Multiple concurrent execute() calls work correctly.

✅ Context manager cleanup: Exception during async with properly calls stop().

✅ Docker container cleanup: stop() handles rm failures gracefully.

✅ Timeout process killing: Local and Docker sandboxes properly kill processes on timeout.

⚠️ Path traversal (LocalSandbox): write_file("../escape.txt") writes outside working directory. This is by design for LocalSandbox (host execution), but should be clearly documented as a security boundary. DockerSandbox containment handles this via container isolation.

Artifacts: All adversarial tests provided in the review above.

🤖 AI agent response. Strands Agents. Feedback welcome!

agent-of-mkmeral commented Mar 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add Sandbox abstraction for agent code execution environments#1968

feat: add Sandbox abstraction for agent code execution environments#1968
agent-of-mkmeral wants to merge 1 commit intostrands-agents:mainfrom
mkmeral:feat/sandbox-abstraction

agent-of-mkmeral commented Mar 25, 2026

Uh oh!

agent-of-mkmeral left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

agent-of-mkmeral commented Mar 25, 2026

Summary

What Changed

New Files (4 source + 5 test)

Modified Files (2)

Design Decisions

✅ IN core SDK

❌ NOT in core SDK (for now)

Key Design Details

Streaming Interface

Agent Integration

Security

Tests

What's Left (Outside This PR)

Uh oh!

agent-of-mkmeral left a comment

Choose a reason for hiding this comment

Finding 1: execute_code() language parameter injection

Finding 2: Long output lines crash readline()

Finding 3: Non-existent working directory crashes

Finding 4: proc.returncode or 0 masks None as 0

Other observations (non-findings)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Finding 1: `execute_code()` language parameter injection

Finding 2: Long output lines crash `readline()`

Finding 4: `proc.returncode or 0` masks None as 0